{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第五章 Pandas入门"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 数据结构"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.1.1 创建Series Dataframe 对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "传入嵌套字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pandas import Series,DataFrame\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nevada</th>\n",
       "      <th>Ohio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>NaN</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td>2.4</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td>2.9</td>\n",
       "      <td>3.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Nevada  Ohio\n",
       "2000     NaN   1.5\n",
       "2001     2.4   NaN\n",
       "2002     2.9   3.6"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop={'Nevada':{2001:2.4,2002:2.9},'Ohio':{2000:1.5,2002:3.6}}\n",
    "f1=DataFrame(pop)\n",
    "f1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### index的方法和属性<br>\n",
    "方法|说明\n",
    ":--|:--\n",
    "`append` |连接两个 `index` ，产生一个**新的 Index**                                                   \n",
    "`delete` |删除索引 `i` 处的元素，并得到**新的 Index**\n",
    "`is_monotonic` |是否单调递增\n",
    "`is_unique` |`index` 中没有重复项时，返回 `True`\n",
    "`unique` |计算 index 中唯一值的数组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 基本功能"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### reindex"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "obj=Series(['blue','yellow','red'],index=[0,2,4])\n",
    "obj2=obj.reindex(range(6),method='ffill')               "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### DataFrame\n",
    "传入一个列表，重新索引行<br>\n",
    "传入两个 list ,分别重新索引行列<br>\n",
    "重新索引列：`frame.reindex(columns=list)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "frame=DataFrame(np.arange(9).reshape(3,3),index=['a','b','c'],columns=['Ohio','Texas','California'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**对行进行索引**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Ohio</th>\n",
       "      <th>Texas</th>\n",
       "      <th>California</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Ohio  Texas  California\n",
       "a     0      1           2\n",
       "d     6      7           8\n",
       "b     3      4           5\n",
       "c     6      7           8"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2=frame.reindex(['a','d','b','c'],method=\"ffill\")  \n",
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "※ **用 method 的前提：**`index must be monotonic increasing or decreasing`\n",
    "<br>`method='ffill'`\n",
    "<br>所以d行的值，取自c行：`frame['d']=frame['c']`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3  col4  col5  col6\n",
       "a     0     1     2     2     2     2\n",
       "d     6     7     8     8     8     8\n",
       "b     3     4     5     5     5     5\n",
       "c     6     7     8     8     8     8"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.columns=[1,2,3]\n",
    "frame3=frame2.reindex(columns=[1,2,3,4,5,5],method='ffill')     #bfill 时返回 NaN\n",
    "frame3.columns=['col1','col2','col3','col4','col5','col6']\n",
    "frame3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "d    7\n",
       "b    4\n",
       "c    7\n",
       "Name: col2, dtype: int32"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame3.xs('col2',axis=1)  #xs方法，获得单行或单列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTES**\n",
    "1. `columns` 为有序值时，可用 `method` 确认`fill` 方式。\n",
    "2. 当传入两个列表同时 `reindex` 行列，`method` 只应用于 **行**\n",
    "3. 调用 `limit` 时，`columns/index` 必须是单调的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 丢弃值 `get/set values`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#丢弃行\n",
    "frame3.drop('a')\n",
    "#丢弃列\n",
    "new_frame=frame3.drop('col6',axis=1)   #drop返回新的DataFrame对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>-1</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3  col4  col5  col6\n",
       "a     0     1     2     2     2     2\n",
       "d     6     7     8     8     8     8\n",
       "b     3     4     5     5     5     5\n",
       "c     6     7     8     8    -1     8"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame3.set_value('c','col5',-1)\n",
    "frame3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 填充\n",
    "1. `reindex` `method` 填充： [`ffill`](#reindex)、`bfill`\n",
    "2. 值填充，`fill_value`\n",
    "3. `reindex` 时也可传入 `fill_value`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a     b     c     d   e\n",
       "0   0.0   2.0   4.0   6.0 NaN\n",
       "1   9.0  11.0  13.0  15.0 NaN\n",
       "2  18.0  20.0  22.0  24.0 NaN\n",
       "3   NaN   NaN   NaN   NaN NaN"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1=DataFrame(np.arange(12).reshape(3,4),columns=list('abcd'))\n",
    "df2=DataFrame(np.arange(20).reshape(4,5),columns=list('abcde'))\n",
    "df1+df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>15.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>19.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a     b     c     d     e\n",
       "0   0.0   2.0   4.0   6.0   4.0\n",
       "1   9.0  11.0  13.0  15.0   9.0\n",
       "2  18.0  20.0  22.0  24.0  14.0\n",
       "3  15.0  16.0  17.0  18.0  19.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.add(df2,fill_value=0)    #fill_value的值传入df1/df1的NaN位置，两个同时为NaN的位置，仍为NaN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**算术方法**\n",
    "\n",
    "\n",
    "方法|说明\n",
    ":--|:--\n",
    "add|加\n",
    "sub|减\n",
    "mul|乘\n",
    "div|除"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DataFrame与Series之间的运算\n",
    "行广播运算(默认)\n",
    "```python\n",
    "df-series    #按行广播\n",
    "```\n",
    "列广播\n",
    "```python\n",
    "df.sub(series,axis=0)  #传入的轴是希望seriesd.index匹配的轴，匹配0轴后，在1轴上广播\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 函数应用与映射"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td>-1.345511</td>\n",
       "      <td>1.241060</td>\n",
       "      <td>-0.335556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td>0.581443</td>\n",
       "      <td>0.386226</td>\n",
       "      <td>2.126242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>-0.019844</td>\n",
       "      <td>-0.420891</td>\n",
       "      <td>-0.177956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>-0.247697</td>\n",
       "      <td>-0.661138</td>\n",
       "      <td>1.690165</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               b         d         e\n",
       "Utah   -1.345511  1.241060 -0.335556\n",
       "Ohio    0.581443  0.386226  2.126242\n",
       "Texas  -0.019844 -0.420891 -0.177956\n",
       "Oregon -0.247697 -0.661138  1.690165"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame(np.random.randn(4, 3), columns=list('bde'),\n",
    "                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])\n",
    "frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### **`apply applymap`**方法\n",
    "- apply()\n",
    " - `df.apply(func)`：作用在一行或一列上，默认0轴（一列上）\n",
    " - `Series.apply(func)`：作用在每个元素上\n",
    " \n",
    " \n",
    "- applymap()\n",
    " - 只用于`df`上，作用在每个元素上<br>\n",
    " like doing `map(func, series)` for each series in the DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Utah      2.586572\n",
       "Ohio      1.740016\n",
       "Texas     0.401047\n",
       "Oregon    2.351303\n",
       "dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f=lambda x: x.max()-x.min()\n",
    "frame.apply(f)\n",
    "frame.apply(f,axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-1.345511</td>\n",
       "      <td>-0.661138</td>\n",
       "      <td>-0.335556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>0.581443</td>\n",
       "      <td>1.241060</td>\n",
       "      <td>2.126242</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            b         d         e\n",
       "min -1.345511 -0.661138 -0.335556\n",
       "max  0.581443  1.241060  2.126242"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f3=lambda x: Series([x.min(),x.max()],index=['min','max'])\n",
    "frame.apply(f3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对每个元素进行格式化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td>-1.35</td>\n",
       "      <td>1.24</td>\n",
       "      <td>-0.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td>0.58</td>\n",
       "      <td>0.39</td>\n",
       "      <td>2.13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>-0.02</td>\n",
       "      <td>-0.42</td>\n",
       "      <td>-0.18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>-0.25</td>\n",
       "      <td>-0.66</td>\n",
       "      <td>1.69</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            b      d      e\n",
       "Utah    -1.35   1.24  -0.34\n",
       "Ohio     0.58   0.39   2.13\n",
       "Texas   -0.02  -0.42  -0.18\n",
       "Oregon  -0.25  -0.66   1.69"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "format=lambda x: '%.2f' %x\n",
    "frame.applymap(format)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对 `df` 中的某一列进行格式化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.409225</td>\n",
       "      <td>-0.059035</td>\n",
       "      <td>-0.69</td>\n",
       "      <td>0.608010</td>\n",
       "      <td>-0.604662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-1.388213</td>\n",
       "      <td>-0.294574</td>\n",
       "      <td>1.03</td>\n",
       "      <td>1.250235</td>\n",
       "      <td>0.795981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.546295</td>\n",
       "      <td>0.716735</td>\n",
       "      <td>-0.68</td>\n",
       "      <td>0.875924</td>\n",
       "      <td>0.589751</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.635919</td>\n",
       "      <td>1.229887</td>\n",
       "      <td>-1.29</td>\n",
       "      <td>-0.920333</td>\n",
       "      <td>1.507943</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1      2         3         4\n",
       "0  0.409225 -0.059035  -0.69  0.608010 -0.604662\n",
       "1 -1.388213 -0.294574   1.03  1.250235  0.795981\n",
       "2 -0.546295  0.716735  -0.68  0.875924  0.589751\n",
       "3 -0.635919  1.229887  -1.29 -0.920333  1.507943"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ff=DataFrame(np.random.randn(20).reshape((4,5)))\n",
    "formater=lambda x: '%.2f' %x\n",
    "ff[2]=ff[2].apply(formater)\n",
    "ff"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 排序 `sort`\n",
    "Refer: [numpy排序](ch04_01.ipynb#排序)\n",
    "1. 按索引排序: \n",
    "```python\n",
    "Series.sort_index()\n",
    "Df.sort_index()\n",
    "Df.sort_index(axis=1)\n",
    "```\n",
    "2. 按值排序\n",
    "```python\n",
    "Series.order()\n",
    "Df.sort_index(by='column_name')   #will be deprecated\n",
    "Df.sort_values(by=['a','b'])    #先按a排序，a值相同再按b排序\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 排名 `ranking`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "返回元素在数组中的排名。<br>\n",
    "相同值的处理：\n",
    "\n",
    "method|说明\n",
    ":--|:---\n",
    "average|默认：Equal values are assigned a rank that is the average of the ranks of those values\n",
    "min|使用最小排名\n",
    "max|使用最大排名\n",
    "first|按出现顺序排名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     a    b    c\n",
       "0  2.0  3.0  1.0\n",
       "1  1.0  3.0  2.0\n",
       "2  2.0  1.0  3.0\n",
       "3  2.0  3.0  1.0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],\n",
    "                   'c': [-2, 5, 8, -2.5]})\n",
    "frame.rank(axis=1,method='first')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 汇总和计算描述统计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pandas.io.data` 模块已迁移到 `pandas_datareader` ( [`Doc`](http://pandas-datareader.readthedocs.io/en/latest/) )\n",
    "```python\n",
    "conda install pandas-datareader\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series.pct_change()\n",
    "<br>返回的Series：在 `index=2` 处：$$\\frac{value[2]-value[1]}{value[1]}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    NaN\n",
       "1    1.0\n",
       "2    0.5\n",
       "dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ss=Series([1,2,3])\n",
    "ss.pct_change()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 协方差/相关系数\n",
    "`.corr` `cov` `corrwith`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.085949</td>\n",
       "      <td>-0.069787</td>\n",
       "      <td>-0.447926</td>\n",
       "      <td>0.685550</td>\n",
       "      <td>0.070329</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.762842</td>\n",
       "      <td>-0.938235</td>\n",
       "      <td>0.511513</td>\n",
       "      <td>0.727425</td>\n",
       "      <td>0.024499</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.083003</td>\n",
       "      <td>0.170441</td>\n",
       "      <td>0.267452</td>\n",
       "      <td>-0.749128</td>\n",
       "      <td>0.675760</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.964924</td>\n",
       "      <td>-0.499859</td>\n",
       "      <td>1.107623</td>\n",
       "      <td>-0.899374</td>\n",
       "      <td>3.016260</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.299568</td>\n",
       "      <td>1.035690</td>\n",
       "      <td>1.041145</td>\n",
       "      <td>-0.328281</td>\n",
       "      <td>0.326810</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1         2         3         4         5\n",
       "0 -0.085949 -0.069787 -0.447926  0.685550  0.070329\n",
       "1 -0.762842 -0.938235  0.511513  0.727425  0.024499\n",
       "2 -0.083003  0.170441  0.267452 -0.749128  0.675760\n",
       "3  0.964924 -0.499859  1.107623 -0.899374  3.016260\n",
       "4  1.299568  1.035690  1.041145 -0.328281  0.326810"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=DataFrame(np.random.randn(5,5),columns=[1,2,3,4,5])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.655582</td>\n",
       "      <td>0.645286</td>\n",
       "      <td>-0.645377</td>\n",
       "      <td>0.508995</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.655582</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.143303</td>\n",
       "      <td>-0.338167</td>\n",
       "      <td>-0.223311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.645286</td>\n",
       "      <td>0.143303</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.604673</td>\n",
       "      <td>0.565986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.645377</td>\n",
       "      <td>-0.338167</td>\n",
       "      <td>-0.604673</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.718155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.508995</td>\n",
       "      <td>-0.223311</td>\n",
       "      <td>0.565986</td>\n",
       "      <td>-0.718155</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1         2         3         4         5\n",
       "1  1.000000  0.655582  0.645286 -0.645377  0.508995\n",
       "2  0.655582  1.000000  0.143303 -0.338167 -0.223311\n",
       "3  0.645286  0.143303  1.000000 -0.604673  0.565986\n",
       "4 -0.645377 -0.338167 -0.604673  1.000000 -0.718155\n",
       "5  0.508995 -0.223311  0.565986 -0.718155  1.000000"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "C=df.corr()\n",
    "C"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df.corr()` 返回相关系数矩阵 $C$<br>\n",
    "\n",
    "$C_{ij}=$ `df[i].corr(df[j])` <br>\n",
    "\n",
    "$C_{ij}=C_{ji}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.65558222415170764"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[2].corr(df[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    0.645286\n",
       "2    0.143303\n",
       "3    1.000000\n",
       "4   -0.604673\n",
       "5    0.565986\n",
       "dtype: float64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.corrwith(df[3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 常用方法\n",
    "1. 唯一性\n",
    "2. 计数\n",
    "3. 成员资格"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "obj=Series(['a','c','d','a','a','b','c','c'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    3\n",
       "d    1\n",
       "c    3\n",
       "b    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.unique()                                       #唯一性\n",
    "obj.value_counts()                                 #计数\n",
    "pd.value_counts(obj.values,sort=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**成员资格**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "2    d\n",
       "3    a\n",
       "4    a\n",
       "dtype: object"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mask=obj.isin(['a','d'])\n",
    "obj[mask]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " **DataFrame.apply**(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)[source]\n",
    "\n",
    "    Applies function along input axis of DataFrame.\n",
    "\n",
    "    Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>qu1</th>\n",
       "      <th>qu2</th>\n",
       "      <th>que3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   qu1  qu2  que3\n",
       "1  1.0  1.0   1.0\n",
       "2  0.0  2.0   1.0\n",
       "3  2.0  2.0   0.0\n",
       "4  2.0  0.0   2.0\n",
       "5  0.0  0.0   1.0"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data=DataFrame({'qu1':[1,3,4,3,4],\n",
    "               'qu2':[2,3,1,2,3],\n",
    "               'que3':[1,5,2,4,4]})\n",
    "data.apply(pd.value_counts).fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 处理缺失数据\n",
    "### 填充确实数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-1.046038</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.880446</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.038886</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.153734</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-0.577060</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.772119</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-0.228833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.189213</td>\n",
       "      <td>-1.483098</td>\n",
       "      <td>-1.541740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.139778</td>\n",
       "      <td>1.269114</td>\n",
       "      <td>-0.900236</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -1.046038       NaN       NaN\n",
       "1  1.880446       NaN       NaN\n",
       "2 -0.038886       NaN       NaN\n",
       "3 -0.153734       NaN -0.577060\n",
       "4 -0.772119       NaN -0.228833\n",
       "5  0.189213 -1.483098 -1.541740\n",
       "6  1.139778  1.269114 -0.900236"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=DataFrame(np.random.randn(7,3))\n",
    "df.iloc[:3,1:]=np.nan    \n",
    "df.iloc[[3,4],1]=np.nan\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 保留有效值个数2及2以上的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.153734</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-0.577060</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.772119</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-0.228833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.189213</td>\n",
       "      <td>-1.483098</td>\n",
       "      <td>-1.541740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.139778</td>\n",
       "      <td>1.269114</td>\n",
       "      <td>-0.900236</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "3 -0.153734       NaN -0.577060\n",
       "4 -0.772119       NaN -0.228833\n",
       "5  0.189213 -1.483098 -1.541740\n",
       "6  1.139778  1.269114 -0.900236"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dropna(thresh=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 不同的列填充不同的值\n",
    "传入一个 `dict`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-1.046038</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.880446</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.038886</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.153734</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.577060</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.772119</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>-0.228833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.189213</td>\n",
       "      <td>-1.483098</td>\n",
       "      <td>-1.541740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.139778</td>\n",
       "      <td>1.269114</td>\n",
       "      <td>-0.900236</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -1.046038  1.000000  2.000000\n",
       "1  1.880446  1.000000  2.000000\n",
       "2 -0.038886  1.000000  2.000000\n",
       "3 -0.153734  1.000000 -0.577060\n",
       "4 -0.772119  1.000000 -0.228833\n",
       "5  0.189213 -1.483098 -1.541740\n",
       "6  1.139778  1.269114 -0.900236"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna({1:1,2:2})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 就地修改原对象\n",
    "`.fillna` 默认返回一个 `copy` \n",
    "\n",
    "就地修改：`.fillna(value,inpalce=True)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.5 层次化索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(a  1   -0.036478\n",
       "    2    0.696885\n",
       "    3    0.233812\n",
       " b  1    0.038440\n",
       "    2    1.558086\n",
       "    3    1.520032\n",
       " c  1   -0.408008\n",
       "    2   -0.272061\n",
       " d  2   -0.298656\n",
       "    3   -0.681329\n",
       " dtype: float64, MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],\n",
       "            labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]]))"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data=Series(np.random.randn(10),index=[['a','a','a','b','b','b','c','c','d','d'],[1,2,3,1,2,3,1,2,2,3]])\n",
    "data,data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">ohio</th>\n",
       "      <th>cali</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>green</th>\n",
       "      <th>red</th>\n",
       "      <th>green</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>0.997447</td>\n",
       "      <td>0.864915</td>\n",
       "      <td>0.895538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.286276</td>\n",
       "      <td>0.777191</td>\n",
       "      <td>0.331116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td>0.599630</td>\n",
       "      <td>0.987011</td>\n",
       "      <td>0.769337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.871370</td>\n",
       "      <td>0.589620</td>\n",
       "      <td>0.639870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         ohio                cali\n",
       "        green       red     green\n",
       "a 1  0.997447  0.864915  0.895538\n",
       "  2  0.286276  0.777191  0.331116\n",
       "b 1  0.599630  0.987011  0.769337\n",
       "  2  0.871370  0.589620  0.639870"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame=DataFrame(np.random.rand(4,3),index=[['a','a','b','b'],[1,2,1,2]],columns=[['ohio','ohio','cali'],['green','red','green']])\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# m_index=pd.Multiindex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th colspan=\"2\" halign=\"left\">ohio</th>\n",
       "      <th>cali</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>green</th>\n",
       "      <th>red</th>\n",
       "      <th>green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>0.997447</td>\n",
       "      <td>0.864915</td>\n",
       "      <td>0.895538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.286276</td>\n",
       "      <td>0.777191</td>\n",
       "      <td>0.331116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td>0.599630</td>\n",
       "      <td>0.987011</td>\n",
       "      <td>0.769337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.871370</td>\n",
       "      <td>0.589620</td>\n",
       "      <td>0.639870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state          ohio                cali\n",
       "color         green       red     green\n",
       "key1 key2                              \n",
       "a    1     0.997447  0.864915  0.895538\n",
       "     2     0.286276  0.777191  0.331116\n",
       "b    1     0.599630  0.987011  0.769337\n",
       "     2     0.871370  0.589620  0.639870"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.index.names=['key1','key2']\n",
    "frame.columns.names=['state','color']\n",
    "\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th colspan=\"2\" halign=\"left\">ohio</th>\n",
       "      <th>cali</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>green</th>\n",
       "      <th>red</th>\n",
       "      <th>green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key2</th>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <th>a</th>\n",
       "      <td>0.997447</td>\n",
       "      <td>0.864915</td>\n",
       "      <td>0.895538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <th>a</th>\n",
       "      <td>0.286276</td>\n",
       "      <td>0.777191</td>\n",
       "      <td>0.331116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <th>b</th>\n",
       "      <td>0.599630</td>\n",
       "      <td>0.987011</td>\n",
       "      <td>0.769337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <th>b</th>\n",
       "      <td>0.871370</td>\n",
       "      <td>0.589620</td>\n",
       "      <td>0.639870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state          ohio                cali\n",
       "color         green       red     green\n",
       "key2 key1                              \n",
       "1    a     0.997447  0.864915  0.895538\n",
       "2    a     0.286276  0.777191  0.331116\n",
       "1    b     0.599630  0.987011  0.769337\n",
       "2    b     0.871370  0.589620  0.639870"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.swaplevel('key1','key2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th colspan=\"2\" halign=\"left\">ohio</th>\n",
       "      <th>cali</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>green</th>\n",
       "      <th>red</th>\n",
       "      <th>green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>0.997447</td>\n",
       "      <td>0.864915</td>\n",
       "      <td>0.895538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.286276</td>\n",
       "      <td>0.777191</td>\n",
       "      <td>0.331116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td>0.599630</td>\n",
       "      <td>0.987011</td>\n",
       "      <td>0.769337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.871370</td>\n",
       "      <td>0.589620</td>\n",
       "      <td>0.639870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state          ohio                cali\n",
       "color         green       red     green\n",
       "key1 key2                              \n",
       "a    1     0.997447  0.864915  0.895538\n",
       "     2     0.286276  0.777191  0.331116\n",
       "b    1     0.599630  0.987011  0.769337\n",
       "     2     0.871370  0.589620  0.639870"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>phase_def</th>\n",
       "      <th>phase_non_def</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>115</td>\n",
       "      <td>56.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>160</td>\n",
       "      <td>55.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>170</td>\n",
       "      <td>55.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>175</td>\n",
       "      <td>55.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>175</td>\n",
       "      <td>55.08</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   phase_def  phase_non_def\n",
       "1        115          56.50\n",
       "2        160          55.00\n",
       "3        170          55.10\n",
       "4        175          55.08\n",
       "5        175          55.08"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=DataFrame({'phase_def':{1:115,2:160,3:170,4:175,5:175},\n",
    "            'phase_non_def':{1:56.5,2:55,3:55.1,4:55.08,5:55.08}})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 转换成 `MultiIndex`\n",
    "\n",
    "`.unstack()` 方法生成新的对象\n",
    "\n",
    "`.swaplevel(0,1)`  生成新对象\n",
    "\n",
    "`.sort_index(level=0)`  生成新对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "phase_def      1    115.00\n",
       "               2    160.00\n",
       "               3    170.00\n",
       "               4    175.00\n",
       "               5    175.00\n",
       "phase_non_def  1     56.50\n",
       "               2     55.00\n",
       "               3     55.10\n",
       "               4     55.08\n",
       "               5     55.08\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2se=df.unstack()\n",
    "df2se"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex(levels=[['phase_def', 'phase_non_def'], [1, 2, 3, 4, 5]],\n",
       "           labels=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2se.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Number  phase        \n",
       "1       phase_def        115.00\n",
       "2       phase_def        160.00\n",
       "3       phase_def        170.00\n",
       "4       phase_def        175.00\n",
       "5       phase_def        175.00\n",
       "1       phase_non_def     56.50\n",
       "2       phase_non_def     55.00\n",
       "3       phase_non_def     55.10\n",
       "4       phase_non_def     55.08\n",
       "5       phase_non_def     55.08\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2se.index.names=['phase','Number']\n",
    "df2se_sw=df2se.swaplevel(0,1)        #生成新对象\n",
    "df2se_sw"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Number  phase        \n",
       "1       phase_def        115.00\n",
       "        phase_non_def     56.50\n",
       "2       phase_def        160.00\n",
       "        phase_non_def     55.00\n",
       "3       phase_def        170.00\n",
       "        phase_non_def     55.10\n",
       "4       phase_def        175.00\n",
       "        phase_non_def     55.08\n",
       "5       phase_def        175.00\n",
       "        phase_non_def     55.08\n",
       "dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2se_sw.sort_index(level=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.5.1 重排分级顺序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.5.2 据级别汇总统计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.5.3 使用DataFrame的列做索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.6 其他有关Pandas的话题"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.6.1 整数索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Series`<br>\n",
    "当索引中 *有整数* 时，根据整数进行数据切片的操作都是面向**标签**的。\n",
    "\n",
    "当索引中无整数时，根据整数进行数据切片的操作都是都是面向**顺序**的\n",
    "\n",
    "`DataFrame`<br>\n",
    "`df` 用 `loc` 进行标签索引，`iloc` 进行顺序索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>-1</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.866600</td>\n",
       "      <td>0.865950</td>\n",
       "      <td>0.890617</td>\n",
       "      <td>0.667237</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.487846</td>\n",
       "      <td>0.277667</td>\n",
       "      <td>0.106869</td>\n",
       "      <td>0.097285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.251482</td>\n",
       "      <td>0.665738</td>\n",
       "      <td>0.116685</td>\n",
       "      <td>0.137195</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.616372</td>\n",
       "      <td>0.854474</td>\n",
       "      <td>0.062562</td>\n",
       "      <td>0.243470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.122070</td>\n",
       "      <td>0.087875</td>\n",
       "      <td>0.134753</td>\n",
       "      <td>0.273231</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1        -1         4         5\n",
       "0  0.866600  0.865950  0.890617  0.667237\n",
       "1  0.487846  0.277667  0.106869  0.097285\n",
       "2  0.251482  0.665738  0.116685  0.137195\n",
       "3  0.616372  0.854474  0.062562  0.243470\n",
       "4  0.122070  0.087875  0.134753  0.273231"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff=DataFrame(np.random.rand(5,4),columns=[1,-1,4,5])\n",
    "dff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.865950\n",
       "1    0.277667\n",
       "2    0.665738\n",
       "3    0.854474\n",
       "4    0.087875\n",
       "Name: -1, dtype: float64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>f</th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-1.086879</td>\n",
       "      <td>-1.109839</td>\n",
       "      <td>-0.075890</td>\n",
       "      <td>0.700439</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.150916</td>\n",
       "      <td>-0.978938</td>\n",
       "      <td>0.471990</td>\n",
       "      <td>-0.783764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.316800</td>\n",
       "      <td>-1.097486</td>\n",
       "      <td>-0.740895</td>\n",
       "      <td>1.310597</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.857949</td>\n",
       "      <td>-0.144822</td>\n",
       "      <td>-0.327928</td>\n",
       "      <td>0.539144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.499346</td>\n",
       "      <td>1.927254</td>\n",
       "      <td>0.431937</td>\n",
       "      <td>1.122111</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          a         f         b         d\n",
       "0 -1.086879 -1.109839 -0.075890  0.700439\n",
       "1  1.150916 -0.978938  0.471990 -0.783764\n",
       "2 -0.316800 -1.097486 -0.740895  1.310597\n",
       "3  0.857949 -0.144822 -0.327928  0.539144\n",
       "4  1.499346  1.927254  0.431937  1.122111"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff2=DataFrame(np.random.randn(5,4),columns=['a','f','b','d'])\n",
    "dff2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.700439\n",
       "1   -0.783764\n",
       "2    1.310597\n",
       "3    0.539144\n",
       "4    1.122111\n",
       "Name: d, dtype: float64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff2.iloc[:,-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.6.2 面板数据"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
